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Abstract — This paper examines synchronization of computer 
clocks connected via a data network and proposes a skewless 
algorithm to synchronize them. Unlike current solutions, which 
either estimate and compensate the frequency difference (skew) 
among clocks or introduce offset corrections that can generate 
jitter and possibly even backward jumps, our algorithm achieves 
synchronization without these problems. We first analyze the 
convergence property of the algorithm and provide necessary 
and sufficient conditions on the parameters to guarantee syn- 
chronization. We then implement our solution on a cluster of 
IBM BladeCenter servers running Linux and experimentally 
study its performance. In particular, through both analytical and 
experimental results, we show that our algorithm can converge 
even in the presence of timing loops. This marks a clear contrast 
with current standards such as NTP and PTP, where timing 
loops are specifically avoided. Furthermore, timing loops can 
even be beneficial in our scheme. For example, it is demonstrated 
that highly connected subnetworks can collectively outperform 
individual clients when the time source has large jitter. 

I. Introduction 

Keeping consistent time among different nodes in a network 
is a basic requirement of many distributed applications. Their 
internal clocks are usually poor and tend to drift apart from 
each other, generating inconsistent time values. Network clock 
synchronization refers to the ability of these devices to correct 
their clocks to match a global reference of time, such as 
the Universal Coordinated Time (UTC), by performing time 
measurements through the network. For example, for the 
Internet, network clock synchronization has been a subject of 
research and several different protocols have been proposed 
[1H6]. 

These protocols are used in a wide variety of applications 
with diverse precision requirements. For example, the current 
standard for IP networks, NTP [1], provides a precision range 
that varies from 100/iS to a few milliseconds depending 
on the network environment. This is suitable for personal 
computers or applications that involve human interactions. On 
the other hand, the Precision Time Protocol (PTP) [2] and 
IBM Coordinated Cluster Time (CCT) solution [7] are able 
to achieve precision of the order of l/is which is suitable for 
distributed applications that require strong time consistency 
such as special-purpose industrial automation, network mea- 
surements and real-time financial transactions. 

There are two major difficulties that make the network clock 
synchronization problem challenging. First, the frequency of 
hardware clocks is sensitive to temperature and constantly 
varies. Second, the latency introduced by OS and network 
congestion delays result in errors in the time measurements. 



Thus, most protocols introduce different ways of estimating 
the frequency mismatch (skew) [8], [9] and measuring the 
time difference (offset) [10], [11]. In particular, the extensive 
literature on skew estimation [9], [12]— [14] for clock synchro- 
nization suggests that accurate clock synchronization can only 
be achieved if a precise estimation of the skew is obtained. 

This paper shows that focusing on skew estimation could 
be misleading. We provide a simple algorithm that is able to 
compensate the clock skew without any explicit estimation of 
it. Our algorithm only uses current offset information and an 
exponential average of the past offsets. Thus, it neither needs 
to store long offset history nor perform involved operations 
over them. We analyze the convergence of the algorithm and 
provide necessary and sufficient conditions for synchroniza- 
tion. The parameter values that guarantee synchronization 
depend on the interconnection topology, but there exists a 
subset of them that is independent of it and therefore of great 
practical interest. 

We also discover another rather surprising fact. A common 
practice in the clock synchronization community is to avoid 
timing loops in the network [1, p. 3] [2, p. 16, s. 6.2]. This is 
because it is thought that timing loops can induce instability as 
stated in [1]: "Drawing from the experience of the telephone 
industry, which learned such lessons at considerable cost, the 
subnet topology... must never be allowed to form a loop." 
Even though for some parameter values loops can produce 
instability, we show in this paper that a proper selection of 
them can guarantee convergence even in the presence of loops. 
Furthermore, we experimentally demonstrate in Section V that 
high connectivity between clients can actually help reduce the 
jitter of the synchronization error! 

The rest of the paper is organized as follows. In Section 

II we describe how clocks are actually implemented in com- 
puters and how different protocols discipline them. Section 

III motivates and describes our algorithm together with an 
intuitive explanation of why it works. In Section IV, we 
analyze the algorithm and determine the set of parameter 
values and connectivity schemes under which synchronization 
is guaranteed. Experimental results evaluating the performance 
of the algorithm are presented in Section V. We conclude in 
Section VI. 

II. Computer Clocks and Synchronization 

Most computer architectures keep track of time using a 
register that is periodically increased by either hardware or 
kernel's interrupt service routines (ISRs). On Linux platforms, 
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there are usually several different clock devices that can be 
selected as the clock source by changing the clocksource 
kernel parameter. 

One particular counter that has recently been used by several 
clock synchronization protocols [4], [7] is the Time Stamp 
Counter (TSC) that counts the number of CPU cycles since 
the last restart. The TSC is a 64-bit counter that has a 
nominal increment period of 5°. For example, in the IBM 
BladeCenter LS21 server, the CPU has a nominal frequency 
f° = 2399.711MHz which makes 5° = 0.416ns. 

Using the TSC counter, the server can compute its own 
estimate x(t) of the reference time t using the following 
equation 

x(t) = S°TSC(t) + x° (1) 

where TSC(t) is the counter's value at time t and x° is the 
estimate of the time when the server was turned on (t Q ). The 
value of the counter can be represented by TSC(t) = L^inJ 
where S is the actual CPU cycle period and |_-J is the floor 
operator. 

Since 5° = 0,426ns <C 1, S°TSC(t) is usually approxi- 
mated by S l^f-\ ~ r(t — t°). r := ^- represents the skew 
of the local clock with respect to its nominal value. When 
r > 1 (r < 1) the clock is running with frequency higher 
(lower) than f° = h. 
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(a) Offset between two TSC counters (b) Effect of adjtimex() on linux time 

Fig. 1: Comparison between two TSC counters and execution 
of adjtimex command 



In an ideal scenario, r equals 1. Unfortunately, the skew 
r varies due to several factors including room temperature, 
among others. This is shown in Figure la where we plot the 
offset variations between the TSC counters of servO and servl 
in our testbed (Figure 2) over more than two days. Therefore, 
it is desirable to compensate the frequency error and compute 
x(t) by introducing a skew correction s in (1). 

Thus, x(t) can be expressed as a linear function of the time 



x(t) = rs(t-t°)+. 



(2) 



This linear map shows explicitly what are the two fundamental 
unknowns in a clock synchronization problem (t° and r) and 
the two parameters that can be used to steer the clock (s and 

x°). 

To discipline x(t) towards t one needs to estimate the offset 
D x (t) = t — x(t) at time t° and the relative frequency error 
f err = In fact, if these values were available at startup 
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Fig. 2: Testbed of IBM BladeCenter blade servers 

(something that in practice is not true), then one could just set 
5 = 1 + f err and add an additional offset to (2) to get 

1 



x(t) 



1 



r 



(t-t°)+x° + D x (t°) 



l(t-t°)+x° + (t° -x°) = t. 



Unfortunately, these values are in general unknown and 
variable. Thus, f err (t) and D x (t) need to be repeatedly 
estimated while the server is running, which introduces several 
constraints on how the clock can be disciplined as it may affect 
the execution of time sensitive applications. 

To understand the differences between current protocols, 
we first rewrite the evolution of x(t) based only on the time 
instants t k in which the clock corrections are performed. We 
allow the skew correction s to vary over time, i.e. s(t k ), and 
write x(tk+i) as a function of x(t k ). Thus, we obtain 



x(tk+i) = x(t k ) + T k rs(t k ) 
s(tk+i) = s(tk) 



+u x (t k ) (3) 
+u s (t k ) (4) 



where r k = £&+i — t k is the time elapsed between adaptations; 
also known as poll interval [1]. 

The values u x (t k ) and u x (t k ) represent two different types 
of correction that a given protocol chooses to do at time t k and 
are usually implemented within the interval (t k ,t k +i). u x (t k ) 
is usually referred to as offset correction and u s (t k ) as skew 
correction. 

These corrections can be done in Linux OS using the 
adjtimex() interface. The commands 

adjtimex -o offset and adjtimex -f freq, 

where offset is in nanoseconds 1 (ns) and freq= 65536 equals 
lppm (parts per million), are equivalent to setting 

u x (t k ) = offset.le~ 9 s and u s (t k ) = (1 + (/re#/65536).le-6). 

Figure lb shows the execution of two offset corrections of 
+20/iS and -20/iS, and one frequency correction of approx 
0.3ppm. We used offset= ±20000 mdfreq= 20000. 

Some protocols prefer instead to maintain their own virtual 
version of x(t) as for example IBM CCT [7] and RAD- 
clock [4]. This gives more control on how the corrections are 
implemented since it does not depend on kernel's routines. 

^n some implementations adjtimex -o also changes the frequency of 
the linux time. In such cases one can perform only offset corrections by 
immediately executing adjtimex -f to correct this change 



(a) NTP initialization period (b) NTP in normal regime 

Fig. 3: Variations of NTP time using TSC as reference 

We now proceed to summarize the different types of adapta- 
tions implemented by current protocols. The main differences 
between them are whether they use offset corrections, skew 
corrections, or both, and whether they update using offset 
values D x (t k ) = t k — x(t k ), frequency errors f err (t k ) = 
^^i, or both. 

r ' 

A. Offset corrections 

This correction consists in using u s (t k ) = and either 

u x (t k ) = K 1 D x (t k ), or (5) 
u x (t k ) = K 1 D x (t k ) + K 2 f err (t k ), (6) 

where k\^k 2 > 0. These adaptations are used by NTPv3 [15] 
and NTPv4 [1] respectively under ordinary conditions. 

The protocols that use (5) or (6) have in general a slow 
initialization period as shown in Figure 3a. This is because 
the algorithm must first obtain a very accurate estimate of the 
initial frequency error f err (t°) and set s(t°) = 1 + f err (t°). 
Furthermore, these updates usually generate non-smooth time 
evolutions as in Figure 3b and should be done carefully since 
they might introduce backward jumps (x(t k +i) < x{t k )). 

B. Skew corrections 

Another alternative that avoids using steep changes in time 
was proposed in [7]. This alternative does not introduce any 
offset correction, i.e. u x {t k ) = 0, and updates the skew s(t k ) 
using 

u s (t k ) = K 1 D x (t k ) + K 2 f err (t k ) (7) 

In [16] it was shown for a slightly modified version of 
(7) (used rf err (t k ) instead of f err (t k )) that under certain 
conditions in the parameter values, the algorithm achieves 
synchronization for very diverse network architectures. 

However, the estimation of f err (t k ) is nontrivial as it is 
constantly changing with subsequent updates of s(t k ) and it 
usually involves sophisticated computations [8], [9]. 

C. Skew and offset corrections 

This type of correction allows dependence on only offset in- 
formation D x (t k ) as input to u x (t k ) and u s (t k ). For instance, 
in [5] the update 

u x (t k ) = KiD x (t k ) and u s (t k ) = K 2 D x (t k ) (8) 

as proposed. 

This option allows the system to achieve synchronization 
without any skew estimation. But the cost of achieving it is 
introducing offset corrections in x(t), incurring in the same 
disadvantages discussed in II- A 



Fig. 4: Unstable clock steering using only offset information 
(9) and stable clock steering based on exponential average 
compensation(ll) 

III. Skewless Network Synchronization 

We now present an algorithm that overcomes the limitations 
of the solutions described in Section II. In other words, our 
solution has the following two properties: 

1) Smoothness: The protocol does not introduce steep 
changes on the time value, i.e. u x (t k ) = 0. 

2) Skew independence: The protocol does not use skew 
information / err as input. 

After describing and motivating our algorithm, we show how 
the updating rule can be implemented in the context of a 
network environment. 

The motivation behind the proposed solution comes from 
trying to compensate the problem that arises when one tries 
to naively impose properties 1) and 2), i.e. using 

u x (t k )=0 and u*(t k ) = KD x (t k ). (9) 

Figure 4 shows that this type of clock corrections is unsta- 
ble; the offset D x (t k ) of the slave clock oscillates with an 
exponentially increasing amplitude. 

The oscillations in Figure 4 arise due to the fundamental 
limitations of using offset to update frequency. On the other 
hand, the exponential increase appears since at time t k+ i the 
update is based on the offset at time t k . Right before updating 
(at the actual value of the offset has a correction 

D x {tl +l )=D x {t k )+T k rr r {t k ) 

which after noticing that f err (t k ) = f err (t^ +1 ) amounts to 
an effective correction given by 

u s (t k ) = KD x (t k ) = KD x (t~ +1 ) - KT k rf err (t- +1 ). 

Thus, at the moment of the correction the offset used implicitly 
includes a negative term in the frequency error that hurts 
synchronization. This is clearly seen in the case of a slower 

_ l—rs(t~ ) 

slave clock f err (t^ +1 ) = fc+1 > with a positive offset 

DX ^k+i) = *ife+i - > °- while the first term of the 

correction tends to speed up the clock, the second term tends 

to slow it down. 

One way to try to damp these unstable oscillations is to add 

a term that opposes the frequency error term. This is done in 

(7) by making k 2 > &\T k r. However, there are other ways to 

generate such a term. For instance, consider the exponentially 

weighted moving average of the offset 

y(t k+1 ) = pD x (t k ) + (1 - p)y(t k ). (10) 
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and update s(t k ) using 

u s (t k ) = (K + 1 )D x (t k )- 1 y(t k ). 

If we again reference these values at the moment right 
before the correction we have 

u s (t k ) = KD*(t- +1 ) - (« + 7 )r fe r/ e -(t- +1 ) 

+ 7( J D x (t fc - +1 )-y(^ +1 ))- 

So now, in the same situation as before (f err (t^ +1 ) > and 
D x (t k+1 ) > 0), we have a new term j(D x (t^ 1 ) - y(t k+1 )). 
This will be in general positive since the offset tends to further 
increase when the slave clock is slower and thus D x (t^ +1 ) — 

Thus, motivated by this discussion, we propose the follow- 
ing updating strategy: 

u x (t k ) = and u s (t k ) = K X D x (t k ) - n 2 y(t k ) (11) 

where n\ = n and n 2 = k + 7. Figure 4 shows how 
the proposed strategy is able to compensate the oscillations 
without needing to estimate the value of f err (t k ). The stability 
of the algorithm will depend on how k\, k 2 and p are chosen. 
A detailed specification of these values is given in Section 
IV-B. 

Finally, since we are interested in studying the effect of 
timing loops, we move away from the client-server configu- 
ration implicitly assumed in Section II and allow mutual or 
cyclic interactions among nodes. Each node i has its own 
clock with skew and maintains its own values of Xi(t k ), 
Si(t k ) and yi(tk). The interactions between different nodes 
is described by a graph G(V, E), where V represents the 
set of nodes (i G V) and E the set of directed edges ij; 
ij G E means node i can measure its offset with respect to j, 
Dfj(t k ) = Xj(t k ) -Xi(t k ). 

Within this context, a natural extension of (11) is to substi- 
tute D x (t k ) with the average of z's neighbors offsets. Thus, 
we propose 

Si(t k +i) =Si(t k ) + Kx^- V DfAt k ) - K 2 yi(t k ) (12a) 



(12b) 



where Mi represents the set of neighbors of i and di = \Mi\. 
The scalar c is a commit or gain factor that controls the offset's 
influence on the system. 

Under this framework, many servers can affect the final 
frequency of the system. Thus in general, when the system 
synchronizes we obtain 



Xi(t k ) = r*t k + x* i G V. 



(13) 



The final frequency r* and offset x* will depend on the initial 
condition of the different clocks and the graph topology. In 
general we will require the graph G(V, E) to be connected. 

Definition 1: We say that the graph G(V, E) is connected 
if there is at least one node i G V such that every other node 
j has a directed path to it. 



IV. Analysis 

In this section we analyze the asymptotic behavior of system 
(12) and provide a necessary and sufficient condition on the 
parameter values that guarantees convergence to (13). The 
techniques used are drawn from the control community, e.g. 
[5] and [16], yet its application in our case is nontrivial. 

Notation: We use mxn 0-mxn) t0 denote the matrices of 
all zeros (ones) within ^ rnxn and n (l n ) to denote the col- 
umn vectors of appropriate dimensions. I n G R nxn represents 
the identity matrix. Given a matrix A G R nxn with Jordan 
normal form A = PJP~ X , let ua^tl denote the total number 
of Jordan blocks J\ with I G 1(A) := {l,...,n^}. We use 
/j>i(A), I G {1, . . . , n} or just 11(A) to denote the eigenvalues 
of A, and order them decreasingly > • • • > |/x n (A)|. 

Finally, A T is the transpose of A, A^ is the element of the 
zth row and jth column of A and a\ is the zth element of the 
column vector a, i.e. a = [a^] T . 

It is more convenient for the analysis to use a vector form 
representation of (12) given by 



where z k := [x(t k ) T s(t k ) T y(t k ) T ] T and 



(14) 



T k :-- 



I n r k R 

— K\CL I n —Kil n 
p(-cL) Onxn (l-p)I n 



and L is the Laplacian matrix associated with G(V, E), 

-(di)' 1 tiijeE, 



1 and Lij 







otherwise. 



The convergence analysis of this section is done in two 
stages. First, we provide necessary and sufficient conditions 
for synchronization in terms of the eigenvalues of T k (Section 
IV- A) and then use Hermite-Biehler Theorem [17] to relate 
these eigenvalues with the parameter values (Section IV-B). 

A. Asymptotic Behavior 

We start by studying the asymptotic behavior of (14). That 
is, we are interested in finding under what conditions the series 
of Xi(t k ) converge to (13). 

We will assume WLOG that r k = r Vfe to simplify 
presentation. The proofs presented in this paper can be readily 
extended for the time varying r k . Thus, we will drop the k 
index of T k from here on. 

Consider the Jordan normal form [18] of T 



r = [Ci 



Csn] J [m 



V3n\ 



where J = blockdiag( Jz)/ G x(r> d an d Vi are me right and 
left generalized eigenvectors of T such that 



1 if j 



otherwise. 



The crux of the analysis comes from understanding the rela- 
tionship between the multiplicity of the eigenvalue ji(T) = 1 



and the eigenvalue n(L) = 0, and their corresponding eigen- 
vectors. 

Lemma 1 (Eigenvalues ofT and Multiplicity of ja(T) = 1): 
T has an eigenvalue /i(T) = 1 with multiplicity 2 if and only 
if the graph G(V, E) is connected, k\ ^ n 2 and p > 0. 

Furthermore, /i(T) are the roots of 

gi (X) := (A - 1) 2 (A - 1 +p) + [(A - + ^ 2 - «i]i/ z (15) 



where ^ = ^(rcLR) and satisfies 

i/ n = < H for I G {1, . . . , n — 1}. 



(16) 



Lemma 2 (Jordan Chains of fi(T) = 1): Under the condi- 
tions of Lemma 1 the right and left Jordan chains, (Ci 7 C2) 
and (772,771) respectively, associated with /i(T) = 1 are given 
by 



Ci = [i£ (£ o T n ] T ,( 2 = K {R ~ lln)T or T 



(17) 

r 

(18) 

where £ is the unique left eigenvector of n(L) = and a is 
the ^-weighted harmonic mean of r i7 i.e. 



77 2 = [0 T n rae -^ T ] T ,% = [(aR-^f T n T n 



p 



1 

a 



(19) 



The proof of Lemmas 1 and 2 can be found in the Appendix. 
We now proceed to state and prove our first convergence result. 

Theorem 1 (Convergence): The algorithm (12) achieves 
synchronization for any initial conditions if and only if the 
graph G(V,E) is connected, k\ ^ K2, p > and |/i/(T)| < 1 
whenever /i/(T) 1. Moreover, whenever the system syn- 
chronizes, we have 



a^-MO), r * = aJ2tM(0) ~ -VM)- (20) 



Proof: We first notice that whenever approaches 
(13) then 

lim x(t h ) - r*l n th = (21) 

h—too 

Sufficiency: Since we are under the assumptions of Lemmas 
1 and 2 we know that /i(T) = 1 has multiplicity 2 and a Jordan 
chain of size 2. Thus, the Jordan normal form of T is 

T 

T 

Oo 



J=[Cl-C3n] 



1 1 

1 

0(3n-2)x2 



J 2x(3n-2) 
J 



771 



where J has eigenvalues with |/i(f )| < p < 1. Thus, it follows 
that 



lim i^-rf -(Ki+C2)r/: 



= [Ci-C: 



,3nJ 







2x2 







(3n-2)x2 



J 2x(3n-2) 
J h 



(22) 



= 






(a) 



(b) 



(c) 



Fig. 5: Graphs with real eigenvalue Laplacians 



where the last equality follows since p < 1, i.e. 



J h 


< 


J 









h—too 



-> 0. 



Now, using (17) and (18) we get 
al^R- 1 



Civi = 



n 



Onxn 
Onxn 



nxn 
'nxn 
Onxn Onxn 



Onxn Onxn 



(Ki+C2)/7 2 T 



O n xn hral n £ T 

Onxn <xR _1 l n £ T 
Onxn Onxn 



T 1 



Therefore, given any initial condition z 
we have 

K2 



lim x(^) - t h al n € (s yo) 

p 



aln^i?- 1 ^ (23) 



and (20) follows from identifying (23) and (21). 

Necessity: The algorithm achieves synchronization when- 
ever (21) holds. Suppose by contradiction, that one of the 
conditions of the theorem does not hold. Then by Lemmas 1 
and 2 there is at least another Jordan block with |/i/ (r)| > 1. 
We have two cases: 1) If \pi (T)\ > 1, then the limits (22) and 
(21) do not exist (contradiction). 2) If |/i/ (r)| = 1, then the 
limit (22) might exist. However, since £^£1 =0 and £^£2 = 
the asymptotic contribution to x(th), s(th) or y(th) will be 
non-homogeneous among different members. Thus, (21) does 
not hold (contradiction). ■ 

B. Necessary and sufficient conditions of synchronization 

We now provide necessary and sufficient conditions on the 
parameter values for Theorem 1 to hold. We will restrict our 
attention to graphs that have Laplacian matrices with real 
eigenvalues. This includes for example trees, symmetric graphs 
(ij G E iff ji G E) and symmetric graphs with a root. See 
Figure 5. 

The proof consists on studying the Schur stability of gi(X) 
and has several steps. We first perform a change of vari- 
able that maps the unit circle onto the left half-plane. This 
transforms the problem of studying the Schur stability into 
a Hurwitz stability problem which is solved using Hermite- 
Beihler Theorem. 

Theorem 2 (Hurwitz Stability (Hermite-Beihler)): Given 
the polynomial P(s) = a n s n + ... + ao, let P r (u) and 
P 1 (uj) be the real and imaginary part of P(joo), i.e. 
P(jco) = P r (co) + jP i ((j). Then P(s) is a Hurwitz 
polynomial if and only if 
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1) a n a n _i > 

2) The zeros of P t (uj) and P 1 (uj) are all simple and real 
and interlace as uj runs from — oo to +oo. 

Proof: See [17]. ■ 
We now compute the parameter values. 
Theorem 3 (Parameter Values for Synchronization): Given 
a connected graph G(V, E) such that L has real eigenvalues. 
The algorithm (12) achieves synchronization if and only if 
(i) |1 — p\ < 1 or equivalently 2 > p > 



2ki 
3p 



> K\ — K2 > 

P(tt2-P(ttl-K 2 )) 



(ii) 

(iii) r < c/imax ( Kl _ p ( Kl _ K2 ))2 
where /i ma x is the largest eigenvalue of LR. 

Proof: We will show that when G(V,E) is connected 
with fi(L) G R, then (i)-(iii) are equivalent to the conditions 
of Theorem 1 . 

Since, G(V, E) is connected and (i)-(ii) satisfies p > and 
Ki 7^ k,2, the conditions of Lemma 1 are satisfied. Therefore 
the multiplicity of /i(T) = 1 is two and by (16) these are the 
roots of 

5 „(A) = (A-l) 2 (A-l+p), 

which corresponds to the case v n = 0. 

Thus, to satisfy Theorem 1 we need to show that the 
remaining eigenvalues are strictly in the unit circle. This is 
true for the remaining root of g n (X) iff (i). 

For the remaining gi(X), this implies that are Schur poly- 
nomials. Thus, we will show that gi(X) is a Schur polynomial 
if and only if (i)-(iii) hold. We drop the subindex I for the rest 
of the proof. 

We first transform the Schur stability problem into a 
Hurwitz stability problem. Consider the change of variable 
A = f±±. Then |A| < 1 if and only if R[s] < 0. 

Now, since v > by (16), let 



P(s) 



(S - l) 3 ^ ( 8 + 1 



Snpv 



4«i 
5np 



\onp 
4(2 -p) | 2#ci 
5npv 5np 



1 



where 5n = k\ — k^. 

We will apply Hermite-Beihler Theorem to P(s), but first 
let us express what 1) and 2) of Theorem (2) means here. 

Condition 1) becomes 

5np 



3 > 0. 



(24) 



Now let P t (uj) and P % (uj) be as in Theorem 2, i.e. let 



P>) =-w 3 + f 4 



,3-— V 
V^/^z/ ' 5 np J 



The roots of P t (uj) are given by ujq = ±a/^5 wnere 



4(2-p) , 2ki 
5 ftp 



(25) 



and the roots of P 1 (uj) by ujq G {0, ±y // cJ^} where 

4 4«i 
£ftz/ 5np 



uj n := 



(26) 



Since the roots P t (uj) and P*(cj) must be real, we must 
have ujq > and uj 1 > 0. Therefore, by monotonicity of the 
square root, the interlacing condition 2) is equivalent to 



< ujq < uj % . 



(27) 



Thus we will show: (i)-(iii) hold (24) and (27) hold. 

It is straightforward to see that using (i) and (ii) we can get 
(24). On the other hand, > from (27) together with (24) 

g ives < + 3 - f§ < which im P lies that s "> > °> 
and therefore (ii) follows. 

Now using (24) and (25), ujq > becomes 

4(2 -p) 2«i , n 
^ + -^-l>0 

which always holds under (i) and (ii) since the first term is 
always positive and - 1 > - 3 > by (24). 
Finally, using (25) and (26), ujq < ujq becomes 



4(2-p) 
&Kpv 



_|_ 2k i 

5kp 



2ki 
5kp 



2ki 
Skp 

i 



1 4 „ 

- < — +3 



5hcp 



4«i 



3 < 



1 



4«i 
Snp 

(2-p) 



2/ti 
5kp 



3 tfftp 5«z/ 
where the left-hand side (LHS) is 

(2^1 — Stzp)S)6tzp + (4^i — 3^p)(2^i — 3<5ftp) 



LHS = 



8(«? 



(2k1 — ^5np)5np 
2ki5kp + (£ftp) 2 ) 8(^i - (5^p) 2 



(2^1 — ?>5np)5np 
and the right hand side (RHS) is 

^ 2/^1 — 35/^p+(2— p)<5« 



Ship 



5nu 



2k\—35kp 
5kp 



(2k1 — ?>5np)5np 



8 ^2 — 

2^i — 35^p* 



Thus < becomes 

8(^i — £/q?) 2 
(2^1 — ^5np)5np 
(k>i — 5 rep) 2 



P 



< 



< 



8 ^2 — 



^kz/ 2ki 
1 



z^ < 



■(«2 

z^ 

P(^2 



3^^p 
5np) 
5np) 



(k,i — Skp) 2 



(28) 



Finally, z/; = ^(rcLR) = rcfjbi(LR). Thus, since (28) 
should hold V/ G {1, n — 1}, then 

which is exactly (iii). ■ 
Even though /i max depends on which is in general 
unknown, it is easy to show that /jh(LR) < r max /i^(L) where 
f max is an upper bound of the maximum rate deviation r^. 
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Furthermore, using Greshgorin's circle theorem, it is easy to 
show that fJL max (L) < 2. Therefore, if we set 

t < s P^- 6 ^ (29) 
2r max (^i - 5kp) 2 

convergence is guaranteed for every graph with real eigenval- 
ues. 

V. Experiments 

We implement an asynchronous version of our algorithm in 
C using the IBM CCT solution as our code base. Our program 
reads the TSC counter directly using the rdtsc assembly 
instruction to minimize reading latencies and maintains a 
virtual clock that can be directly updated. The list of neighbors 
is read from a configuration file and whenever there is no 
neighbor, the program follows the local Linux clock. Finally, 
offset measurements are taken using an improved ping pong 
mechanism proposed in [7]. 

We run our skewless protocol in a cluster of IBM Blade- 
Center LS21 servers with two AMD Opteron processors of 
2.40GHz, and 16GB of memory. As shown in Figure 2, 
the servers servl-servlO are used to run the protocol. The 
offset measurements are taken through a Gigabit Ethernet 
switch. Server servO is used as a reference node and gathers 
time information from the different nodes using a Cisco 4x 
InfiniBand Switch that supports up to lOGbps between any 
two ports and up to 240Gbps of aggregate bandwidth. This 
minimizes the error induced by the data collecting process. 

We use this testbed to validate the analysis in Section IV. 
First, we illustrate the effect of different parameters and ana- 
lyze the effect of the network configuration on convergence. 
(Experiment 1) Then we present a series of configurations 
that demonstrate how connectivity between clients is useful 
in reducing the jitter of a noisy clock source. (Experiment 2) 
And finally, we compare the performance of the algorithm 
with respect to the skew based algorithm proposed in [7] 
(Experiment 3). 

Unless explicitly stated, the default parameter values used 
are 



servl 



servl 



p = 0.99, «i = 1.1 and n 2 = 1.0. 



(30) 



Even though Theorem 3 allows p to have any value between 
and 2 the behavior of the algorithm is smoother when p G 
(0, 1), i.e. when yi(tk) is in fact a weighted average. 

Notice that these values immediately satisfy (i) and (ii) of 
Theorem 3 since 1-p = 0.01, ^ = 0.7407 > k x -k 2 = 0.1. 
The remaining condition can be satisfied by modifying r or 
equivalently c. Here, we choose to fix c = 0.7 which makes 
condition (iii) 



r < 



1.2717 



For fixed time step r, the stability of the system depends on 
the value of /i max . 

Experiment 1: We first consider the client server configuration 
described in Figure 6 (a) with a time step 




(a) 

Fig. 6: Effect of topology on convergence: (a) client- server 
configuration (b) Two clients connected to server and mutually 
connected 



In this configuration /i max = 1 and therefore condition (iii) 
becomes r < 1.2717s. Figure 7a shows the offset between 
servl (the leader) and serv2 (the client) in microseconds. There 
we can see how serv2 gradually updates s 2 until the offset 
becomes insignificant. 



-servl 
- serv2 
-serv3 



T = lS. 



(31) 



(a) Client server configuration: r = (b) Two clients mutually connected 
Is with r = Is 

Fig. 7: Offset of the nine servers connected to a noisy clock 
source 

Figure 7a tends to suggest that the set of parameters given 
by (30) and (31) are suitable for deployment on the servers. 
This is in fact true provided that network is a directed tree as 
in Figure 5 (a). The intuition behind this fact is that in a tree, 
each client connects only to one server. Thus, those connected 
to the leader will synchronize first and then subsequent layers 
will follow. 

However, once loops appear in the network there is no 
longer a clear dependency since two given nodes can mutually 
get information from each other. This type of dependency 
might make the algorithm unstable. 

Figure 7b shows an experiment with the same configuration 
as Figure 7a in which serv2 synchronizes with servl until 
a third server (serv3) appears after 60s. At that moment the 
system is reconfigured to have the topology of Figure 6 (b) 
introducing a timing loop between serv2 and serv3. This 
timing loop makes the system unstable. 

The instability arises since after serv3 starts the topology 
changes, setting /i max = 1-5. Thus, the time step condition 
(iii) becomes r < 847.8ms which is no longer satisfied by 
r = Is. 

Using (29) we can recover the stability of the system by 
setting 

1 2717 

r = 500ms < — s = 645.85ms 

Figure 8 shows how now serv2 and serv3 can synchronize 
with servl after introducing this change. 



Fig. 8: Two clients mutually connected with r = 500ms 



(a) Star topology (K = 0) 



(b) Complete subgraph (K = 4) 



Experiment 2: We now show how timing loops can be used 
to collectively outperform individual clients when the time 
source is noisy. As performance metric we choose the mean 
relative deviation from the leader, i.e. \/ r S^ where 

1 n 

S n = -V^-xx) 2 . (32) 

We disable the offset filter to observe the worst case scenario. 

We run our algorithm on 10 servers (servl through servlO). 
The connection setup is described in Figure 9. Every node is 
directly connected unidirectionally to the leader (servl) and 
bidirectionally to 2K additional neighbors. When K = then 



K=0 K=2 




Fig. 9: Leader topologies with 2K neighbors connection. 
Connections to the leader (servl) are unidirectional while 
the connections among clients (serv2 trhough servlO) are 
bidirectional 

the network reduces to a star topology and when K = 4 the 
servers serv2 through servlO form a complete graph. 

The dashed arrows in Figure 9 show the connections 
where jitter was introduced. To emulate a link with jitter 
we added random noise r] with values taken uniformly from 
{0, 1, Jitter max } on both direction of the communication. 

r] e {0, 1, Jitter max }ms (33) 

Notice that the arrow only shows a dependency relationship, 
the ping pong mechanism sends packets in both direction of 
the physical communication. We used a value of Jitter max = 
10ms. Since the error was introduced in both directions of the 
ping pong, this is equivalent to a standard deviation of 6.05ms. 

Figure 10 illustrates the relative offset between the two 
extreme cases; The star topology (K = 0) is shown in Figure 
10a, and the complete subgraph (K = 4) is shown in Figure 
10b. 



Fig. 10: Offset of the nine servers connected to a noisy clock 
source 




K 



Fig. 11: Effect of the client's communication topology on 
the mean relative deviation. As the connectivity increases (K 
increases) the mean relative deviation is reduced by factor of 
6.26, i.e. a noise reduction of approx. 8dB. 



The worst case offset for K = is 5.1ms which is on the 
order of the standard deviation of the jitter. However, when 
K = 4 we obtain a worst case offset of 690.8/is. 

The mean relative deviation \/~S^ as the connectivity among 
clients increases from isolated nodes (K = 0) to a complete 
subgraph (K = 4) is studied in Figure 11. The results pre- 
sented show that without any type of error filtering the network 
itself is able to perform a distributed filtering that achieves an 
improvement of up to a factor of 6.26 or equivalently a noise 
reduction of almost 8dB. 

Experiment 3: We now proceed to compare the performance 
of our algorithm (algl in the figures) with respect to the one 
proposed in [7] (alg2). Since alg2 performs some filtering, for 
the purpose of fairness we filter the offset measurements of 
algl by only accepting those whose round trip time are within 
twice the mean deviation. We use c = 0.35 and r = 250ras. 




2 4 6 8 10 °0 2 4 6 8 10 

Jitter mav Cms) Jitter maY Cms) 



(a) Mean relative deviation \/~S^ (b) Maximum offset 

Fig. 12: Performance evaluation between our solution (algl) 
and IBM CCT (alg2) 
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In Figure 12a we present the mean relative deviation 
for a star topology as the jitter is increased from Jitter max = 
Oms (no jitter) to Jitter max = 10ms with a granularity of 1ms. 
The worst case offset is shown in Figure 12b. While the mean 
relative deviation is slightly better for alg2, the maximum 
offset is better for our algorithm when the jitter is high. This is 
because extreme burst of jitter can introduce large errors in the 
estimation of the skew which makes alg2 steer significantly. 

VI. Conclusion 

This paper presents a clock synchronization protocol that 
is able to synchronize networked nodes without explicit es- 
timation of the clock skews and steep corrections on the 
time. Unlike current standards, our protocol is guaranteed to 
converge even in the presence of timing loops which allow 
different clients to share timing information and collectively 
outperform individual clients when the time source has large 
jitter. We implemented our solution on a cluster of IBM 
BladeCenter servers and report its performance. 
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Appendix 

A. Proof of Lemma 1 

Proof: We first compute the characteristic polynomial of 
T. In order to compute 

(A - l)I n -tR 
KicL (A - l)I n n 2 I n 
pcL (A-l+p)J* 

we will iteratively use the following property of the determi- 
nant of block matrices det(A) = det(Au) det(A\Au) where 

A A 11 A A 12 and A\Au = A 22 - A 21 A^A 12 is the 

A 21 A 22 J 

Schur^complement of An [18]. 



det(A/ 3 n - T) 



Thus, it follows 



r) = det((A 

(A-l)/ n 



det(A/ 3n - 

= (A-ir 

= det((A-l) 2 (A- 
+ (^2 - k\)\tcLK) 



l)/ n )det((A/ 3n -r)\(A 
(A-l+p)J n 

l+jp)/n + [(A-l)«i 



flaw, 



1=1 



where gi(X) is as defined in (15). 

Thus, A = 1 is a double root of the characteristic polynomial 
if and only if k,i ^ k 2 , p > and rcLR has a simple zero 
eigenvalue, i.e. (16). Now, since R is nonsingular (16) must 
hold for the eigenvalues of L as well, which is in fact true if 
and only if the directed graph G(V, E) is connected [16]. ■ 

B. Proof of Lemma 2 

Proof: We start by computing the right Jordan chain. By 
definition of Ci, (r - J)Ci = n . Thus, if fi = [x T s T y T ] T , 
then the following system of equations must be satisfied 

tRs = (a), — K\cLx — n 2 y = (b), — pcLx — py = (c). 

(34) 

Equation (34a) implies s = 0. Now, (34c) implies cLx = — y, 
which when substituted in (34b) gives (k 2 — Ki)y = 0. Thus, 
since k>i ^ k 2 , y = and x G ker(L). By choosing x = l n 
we obtain (j. Notice that the computation also shows that £i 
is the unique eigenvector of /i(T) = 1 which implies that there 
is only one Jordan block of size 2. The second member of the 
chain, ( 2 , can be computed similarly by solving (T — 1)^2 — 

Ci. 

The vectors r\ 2 and r\\ can be solved in the same way using 
r] 2 (T — I) = and rji(T — I) = r] 2 which gives (18). Equation 
(19) follows from imposing CiVi — 1- ■ 



